\section{Related and Future Work}\label{sec:future}

In contrast to the vast body of existing research on sentiment and topic analysis in social media, and in contrast with generation tasks whose input is artificial or pre-defined, our system implements a full end-to-end cycle from natural language analysis to natural language generation, with applications in social media and in supporting interaction in real-world settings.

A natural point of contact between our work and much existing work in social media analysis is investigating how a change in the methods implementing individual components (topic inference, sentiment scoring) would affect the overall generation. In particular, it would be interesting to test whether a mechanism for joint inference of topic and sentiment distributions could improve the appropriateness or human-likeness of the responses.
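As a first approximation of such joint inference, one could score each topic's sentiment by weighting a polarity lexicon with that topic's word distribution. The sketch below is purely illustrative: the topic distributions, the lexicon, and the function names are toy stand-ins, not our actual models.

```python
# Illustrative sketch: per-topic sentiment from a topic-word distribution
# and a polarity lexicon. Topics and lexicon are toy stand-ins, not the
# actual components of the system described in the paper.

def topic_sentiment(topic_word_probs, lexicon):
    """Lexicon-weighted average polarity of a topic's word distribution."""
    mass = sum(p for w, p in topic_word_probs.items() if w in lexicon)
    if mass == 0.0:
        return 0.0  # no lexicon coverage: treat the topic as neutral
    return sum(p * lexicon[w] for w, p in topic_word_probs.items()
               if w in lexicon) / mass

# Toy topic model output: word -> probability within each topic.
topics = {
    "service": {"helpful": 0.30, "rude": 0.10, "staff": 0.60},
    "price":   {"expensive": 0.50, "bargain": 0.20, "cost": 0.30},
}
# Toy polarity lexicon with scores in [-1, 1].
lexicon = {"helpful": 0.8, "rude": -0.9, "expensive": -0.6, "bargain": 0.7}

scores = {name: topic_sentiment(dist, lexicon) for name, dist in topics.items()}
```

A generator consuming `scores` could then target its response at the most polarized topic rather than at a single document-level sentiment.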


The syntactic and semantic means we use for generating grammatical relations are based on bare-bones templates \cite{Theune:2001:DSG:973927.973930}. These can be expanded with different ways to express subject/object relations, connectivity between phrases, polarity of sentences, and so on. Additional approaches to generation that can factor in such aspects are template-based methods, as in \newcite{becker-2002} and \newcite{conf/aiide/NarayanIR11}, or grammar-based methods, as in \newcite{DeVault:2008:PGN:1708322.1708338}. Using more sophisticated generation methods with a full-fledged grammatical component may help avoid the typical computer-generated response patterns that our human raters detected.
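A minimal template-based generator in this spirit combines a topic keyword and a coarse sentiment polarity to fill a canned response frame. The templates, thresholds, and function names below are hypothetical simplifications for illustration, not our actual system.

```python
# Minimal template-based response generation sketch. The templates and
# the polarity thresholds are hypothetical simplifications.

TEMPLATES = {
    "positive": "I really enjoyed reading about {topic} -- great points!",
    "negative": "I have to disagree with the part about {topic}.",
    "neutral":  "Interesting take on {topic}; I'd love to hear more.",
}

def generate_response(topic_keyword, sentiment_score):
    """Pick a template by coarse polarity and fill the topic slot."""
    if sentiment_score > 0.2:
        polarity = "positive"
    elif sentiment_score < -0.2:
        polarity = "negative"
    else:
        polarity = "neutral"
    return TEMPLATES[polarity].format(topic=topic_keyword)

print(generate_response("healthcare", 0.6))
# -> I really enjoyed reading about healthcare -- great points!
```

The fixed surface forms make the limitation discussed above concrete: every positive-polarity response shares the same frame, which is exactly the kind of regularity human raters can learn to spot.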

Furthermore, our result concerning the human-likeness of \(g_{\rm kb}\) clearly demonstrates that semantic knowledge must be brought in to support better, more human-like generation.
Wide-coverage ontologies and taxonomies such as Freebase support many semantic tasks \cite{Jacobs:1985:KAL:894296}, and can be used to provide richer context for response generation.
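For instance, a taxonomy lookup could let the generator refer to a keyword's broader category, grounding the response in more context than the keyword alone. The hand-built dictionary below merely stands in for a wide-coverage resource such as Freebase; all names are illustrative.

```python
# Toy taxonomy standing in for a wide-coverage ontology (e.g., Freebase).
# Maps a keyword to a broader category that the generator can mention.

TAXONOMY = {
    "guitar": "musical instrument",
    "senate": "government body",
    "aspirin": "medication",
}

def broaden(keyword):
    """Return the keyword's category, falling back to a generic 'topic'."""
    return TAXONOMY.get(keyword.lower(), "topic")

def contextual_phrase(keyword):
    """Build a phrase that relates the keyword to its broader category."""
    return "the {kw}, like any {cat},".format(kw=keyword, cat=broaden(keyword))

print(contextual_phrase("senate"))
# -> the senate, like any government body,
```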


From a theoretical viewpoint, the system would clearly benefit from a rigorous analysis of human interaction in
online media. Responses to user-generated content on the Internet share linguistic characteristics in structure, length, and manner of expression. Studying these features theoretically, and then examining them empirically using a Turing-like evaluation as presented here, can take us a big step towards better generation, as well as a better understanding of the human process of generating responses.

This latter understanding may be complemented with insights into the causes, motivations, and
intricacies of human interaction in such environments, as studied by sociologists and psychologists. In particular,
our preliminary interaction with colleagues from communication studies suggests that the present
endeavor nicely complements that of ``persuasive computing'' \cite{Fogg:1998:PCP:274644.274677,Fogg:2002:PTU:764008.763957}, and we hope that this collaboration will lead to valuable synergies between the two threads of research.

Bridging the gap between the technical and the theoretical, it
would be fascinating to test the responses in the context for which they are
generated -- social media.
Generated texts may be posted as responses to the original article, or shared together with a link to the original article, and the responses to, and shares of, those texts can then be measured. Such a real-world evaluation could indicate whether generated responses are indeed believable and engaging, and it more closely simulates a Turing-like test in which generated responses cannot be distinguished from human ones.

